About CUDA

CUDA

Basics

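CUDA stands for Compute Unified Device Architecture
It exposes the computational horsepower of NVIDIA GPUs for general-purpose GPU computing
The programming model scales across any number of threads, so the same program can run on GPUs with different numbers of cores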

Software

Based on industry-standard C
Small set of extensions to C language
Gentle learning curve for C programmers
Straightforward APIs to manage devices, memory, etc.
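
As a rough illustration of the device-management side of the API, the sketch below lists the GPUs visible to the program. It assumes the CUDA runtime API (cudaGetDeviceCount, cudaGetDeviceProperties); the fields printed are just one possible selection.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int deviceCount = 0;

    /* Ask the runtime how many CUDA-capable devices are present. */
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* Print a few properties of each device. */
    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s, %zu MiB global memory, %d multiprocessors\n",
               i, prop.name, prop.totalGlobalMem >> 20, prop.multiProcessorCount);
    }
    return 0;
}
```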

Terminology

Host - The CPU and its memory
Device - The GPU and its memory
Kernel - A function compiled for the device and executed on the device by many threads
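
To make the three terms concrete, here is a minimal sketch in which host code launches a kernel on the device. The kernel name hello and the launch size (2 blocks of 4 threads) are arbitrary choices for illustration; device-side printf is assumed to be available (compute capability 2.0 or later).

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Kernel: compiled for the device and executed there by many threads. */
__global__ void hello(void) {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

/* Host code: runs on the CPU and launches the kernel. */
int main(void) {
    /* Launch 2 blocks of 4 threads each, so 8 threads run the kernel. */
    hello<<<2, 4>>>();

    /* Wait for the device to finish so its output is flushed. */
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The order in which the blocks print is not guaranteed, which is a small reminder that the threads run independently.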

Pre-requisites

You (probably) need experience with C or C++
You do not need any GPU experience
You do not need any graphics experience
You do not need any parallel programming experience

Why CUDA?

Data Parallelism - A program property where the same arithmetic operations are performed simultaneously on many elements of a data structure.

Example: a 1,000 x 1,000 matrix multiplication:

1,000,000 independent dot products.
Each dot product takes 1,000 multiply and 1,000 add operations.
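
The matrix example above can be sketched as a kernel in which each thread computes one of the 1,000,000 dot products. This is a naive version for illustration only, not a tuned implementation. The name matMul, the 16 x 16 block size, and the use of managed memory (cudaMallocManaged, which needs a reasonably recent GPU) are assumptions made to keep the sketch short.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 1000

/* Each thread computes one element of c: one dot product of length n. */
__global__ void matMul(const float *a, const float *b, float *c, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += a[row * n + k] * b[k * n + col];  /* 1 multiply + 1 add */
        c[row * n + col] = sum;
    }
}

int main(void) {
    size_t bytes = (size_t)N * N * sizeof(float);
    float *a, *b, *c;

    /* Managed memory is reachable from both host and device. */
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&c, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    for (int i = 0; i < N * N; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    /* Enough 16 x 16 blocks to cover the whole N x N result. */
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matMul<<<grid, block>>>(a, b, c, N);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* With a = 1 and b = 2 everywhere, every element is 1000 * 2 = 2000. */
    printf("c[0] = %.1f, c[last] = %.1f\n", c[0], c[N * N - 1]);

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```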

Thread Creation - CUDA threads are much more lightweight than CPU threads.

CUDA threads take only a few cycles to create and schedule, thanks to efficient hardware support.
CPU threads typically take thousands of clock cycles to create and schedule.

CUDA avoids the performance overhead of graphics-layer APIs by compiling software directly for the hardware (GPU assembly language).

Example of CUDA processing flow

Copy data from main memory to GPU memory
The CPU instructs the GPU to start processing (it launches the kernel)
The GPU executes the kernel in parallel across its cores
Copy the result from GPU memory to main memory
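
The four steps above can be sketched with a simple vector addition. The kernel name vecAdd, the array size, and the 256-thread block size are illustrative assumptions, and error checking is abbreviated to keep the flow visible.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *x, const float *y, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    /* Host (main memory) buffers. */
    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    float *hout = (float *)malloc(bytes);
    if (!hx || !hy || !hout) return 1;
    for (int i = 0; i < n; ++i) {
        hx[i] = 1.0f;
        hy[i] = 2.0f;
    }

    /* Device (GPU memory) buffers. */
    float *dx, *dy, *dout;
    if (cudaMalloc((void **)&dx, bytes) != cudaSuccess ||
        cudaMalloc((void **)&dy, bytes) != cudaSuccess ||
        cudaMalloc((void **)&dout, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    /* Step 1: copy data from main memory to GPU memory. */
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    /* Steps 2 and 3: the CPU launches the kernel; the GPU runs it in parallel. */
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dx, dy, dout, n);

    /* Step 4: copy the result back (this call waits for the kernel to finish). */
    cudaError_t err = cudaMemcpy(hout, dout, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("hout[0] = %.1f\n", hout[0]);  /* 1.0 + 2.0 = 3.0 */

    cudaFree(dx);
    cudaFree(dy);
    cudaFree(dout);
    free(hx);
    free(hy);
    free(hout);
    return 0;
}
```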

How to Write

Create or edit the CUDA program with your favorite editor. Note: CUDA C source files use the suffix ".cu".
Compile the program with nvcc to create the executable.
Run the executable.
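
As a small end-to-end sketch of these steps, the file below can be saved, compiled, and run. The file name first.cu, the output name first, and the kernel name noop are hypothetical; the commands in the opening comment assume nvcc is on your PATH.

```cuda
/*
 * Save as first.cu, then build and run:
 *   nvcc first.cu -o first
 *   ./first
 */
#include <stdio.h>
#include <cuda_runtime.h>

/* An empty kernel, used only to confirm that a launch works. */
__global__ void noop(void) {}

int main(void) {
    noop<<<1, 1>>>();

    /* Check for launch errors, then wait for the kernel to finish. */
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Kernel ran successfully\n");
    return 0;
}
```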

Copyright © 2014 - All Rights Reserved - Domain Name
